Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define TR caching comms in ATD #353

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Define TR caching comms in ATD #353

wants to merge 1 commit into from

Conversation

aryx
Copy link
Collaborator

@aryx aryx commented Feb 19, 2025

To make things more concrete about the interface we want
with the backend.

test plan:
make

  • I ran make setup && make to update the generated code after editing a .atd file (TODO: have a CI check)
  • I made sure we're still backward compatible with old versions of the CLI.
    For example, the Semgrep backend need to still be able to consume data
    generated by Semgrep 1.50.0.
    See https://atd.readthedocs.io/en/latest/atdgen-tutorial.html#smooth-protocol-upgrades
    Note that the types related to the semgrep-core JSON output or the
    semgrep-core RPC do not need to be backward compatible!

To make things more concrete about the interface we want
with the backend.

test plan:
make
@aryx aryx requested review from a team, emjin, aaronmichaelacosta and bkettle and removed request for a team and emjin February 19, 2025 13:48
Copy link

Backwards compatibility summary:

Checking backward compatibility of semgrep_output_v1.atd against past version v1.100.0
Checking backward compatibility of semgrep_output_v1.atd against past version v1.101.0
Skipping v1.102.0 because commit 1c82453e89e0b569630e48ddde015e201df0e5f9 has already been checked
Checking backward compatibility of semgrep_output_v1.atd against past version v1.103.0
Checking backward compatibility of semgrep_output_v1.atd against past version v1.104.0
Skipping v1.106.0 because commit 5e0c767ec323f3f2356d3bf8dbdf7c7836497d8a has already been checked
Skipping v1.107.0 because commit 5e0c767ec323f3f2356d3bf8dbdf7c7836497d8a has already been checked
Checking backward compatibility of semgrep_output_v1.atd against past version v1.108.0
Checking backward compatibility of semgrep_output_v1.atd against past version v1.75.0
Skipping v1.76.0 because commit 9102031608aa4154e1c37f557550ec4eabc8780c has already been checked
Checking backward compatibility of semgrep_output_v1.atd against past version v1.77.0
Skipping v1.78.0 because commit dcb5d77b420ddee61f58aadd3c2c7aef38778154 has already been checked
Checking backward compatibility of semgrep_output_v1.atd against past version v1.79.0
Checking backward compatibility of semgrep_output_v1.atd against past version v1.80.0
Checking backward compatibility of semgrep_output_v1.atd against past version v1.81.0
Skipping v1.82.0 because commit 9e0f3bec26b07b4fb6753a32cb75277f45f2572c has already been checked
Skipping v1.83.0 because commit 9e0f3bec26b07b4fb6753a32cb75277f45f2572c has already been checked
Checking backward compatibility of semgrep_output_v1.atd against past version v1.84.0
Skipping v1.84.1 because commit 3daef49297ada205359cc1d2996354c94b628b0d has already been checked
Checking backward compatibility of semgrep_output_v1.atd against past version v1.85.0
Checking backward compatibility of semgrep_output_v1.atd against past version v1.86.0
Checking backward compatibility of semgrep_output_v1.atd against past version v1.87.0
Skipping v1.88.0 because commit 512c0bd97db59c48a5705b2741662a338776e438 has already been checked
Skipping v1.89.0 because commit 512c0bd97db59c48a5705b2741662a338776e438 has already been checked
Checking backward compatibility of semgrep_output_v1.atd against past version v1.90.0
Checking backward compatibility of semgrep_output_v1.atd against past version v1.91.0
Skipping v1.92.0 because commit 2351c5e528cb7430422208dc66707894c066b508 has already been checked
Checking backward compatibility of semgrep_output_v1.atd against past version v1.93.0
Checking backward compatibility of semgrep_output_v1.atd against past version v1.94.0
Checking backward compatibility of semgrep_output_v1.atd against past version v1.95.0
Checking backward compatibility of semgrep_output_v1.atd against past version v1.96.0
Checking backward compatibility of semgrep_output_v1.atd against past version v1.97.0
Checking backward compatibility of semgrep_output_v1.atd against past version v1.98.0
Skipping v1.99.0 because commit 60809032a2e39742f42910d46b3e5dd305b8b8cf has already been checked

type tr_cache_key = {
rule_id: rule_id;
(* ex: http://some-website/hello-world.0.1.2.tgz like in found_dependency *)
resolved_url: string;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this will be easy to find in many cases, though I agree that if it's possible it makes the best key. I guess it is probably fine to start with this, and add another key later if needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we can always refine. This is just defining the interface. Once we start the implementation we will discover
we need to refine it.

* and [transitive_unreachable] records?
* TODO? make it a list? match_results: ... list; ?
*)
match_result: sca_match_kind;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit odd, since when scanning a package with a rule it will result in direct code matches, not sca matches. Maybe we should just return the match here? Then, the CLI could treat those matches the same as matches that it receives from a call to Semgrep locally.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. This is a bigger data structure to store then though, and for TR what we really need is actually just the sca_transitive_match_kind; that's the thing we try to optimize to avoid downloading the dependency and run semgrep on it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should at least store the match locations. I don't think it makes sense to store sca_match_kind because if there are matches in multiple packages, we will somehow need to combine those into a single finding in the cli

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants